Constructing a Speech Translation System using Simultaneous Interpretation Data
Abstract
There has been a fair amount of work on automatic speech translation systems that translate in real time, serving as a computerized version of a simultaneous interpreter. Translation studies has observed that simultaneous interpreters use a number of techniques to make content easier to understand in real time, including dividing their translations into small chunks and summarizing less important content. However, most previous work has not taken this into account, and simply uses translation data (made by translators) to train the machine translation system. In this paper, we examine the possibility of additionally incorporating simultaneous interpretation data (made by simultaneous interpreters) in the training process. First, we collect simultaneous interpretation data from professional simultaneous interpreters of three experience levels and analyze the data. Next, we incorporate the simultaneous interpretation data into the training of the machine translation system. As a result, the translation style of the system becomes more similar to that of a highly experienced simultaneous interpreter. We also find that, according to automatic evaluation metrics, our system achieves performance similar to that of a simultaneous interpreter with one year of experience.
Similar resources
Influence of pause length on listeners' impressions in simultaneous interpretation
We have been attempting to realize simultaneous machine interpretation. However, determining the interpreting utterance timing is as difficult as determining translation units. This remains a major concern for the development of such a speech translation system. It is also crucial for the system’s users that the speech generated by the system is clear and easy to listen to. In this paper, we fo...
Corpus analysis of simultaneous interpretation data for improving real time speech translation
Real-time speech-to-speech (S2S) translation of lectures and speeches requires simultaneous translation with low latency to continually engage the listeners. However, simultaneous speech-to-speech translation systems have predominantly repurposed translation models that are typically trained for consecutive translation, without a motivated attempt to model incrementality. Furthermore, the n...
Role of pausing in text-to-speech synthesis for simultaneous interpretation
The goal of simultaneous speech-to-speech (S2S) translation is to translate source language speech into the target language with low latency. While conventional S2S translation systems typically ignore source language acoustic-prosodic information such as pausing, exploiting such information for simultaneous S2S translation can potentially aid in the chunking of source text in...
Construction of Chunk-Aligned Bilingual Lecture Corpus for Simultaneous Machine Translation
With the development of speech and language processing, speech translation systems have been developed. These studies target spoken dialogues and employ consecutive interpretation, which uses a sentence as the translation unit. On the other hand, there are a few studies of simultaneous interpreting, and recently, the language resources for promoting simultaneous interpreting r...
Rapid development of speech translation using consecutive interpretation
The development of a speech translation (ST) system is costly, largely because it is expensive to collect parallel data. A new language pair is typically only considered in the aftermath of an international crisis that incurs a major need of crosslingual communication. Urgency justifies the deployment of interpreters while data is being collected. In recent work, we have shown that audio record...